Dictionary matching in a stream
We consider the problem of dictionary matching in a stream. Given a set of
strings, known as a dictionary, and a stream of characters arriving one at a
time, the task is to report each time some string in our dictionary occurs in
the stream. We present a randomised algorithm which takes O(log log(k + m))
time per arriving character and uses O(k log m) words of space, where k is the
number of strings in the dictionary and m is the length of the longest string
in the dictionary.
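To make the problem statement concrete, here is a minimal baseline sketch in Python. It is not the randomised algorithm from the abstract and does not achieve its time or space bounds: it simply keeps the last m characters of the stream and checks each suffix whose length matches some dictionary string. The class name `NaiveStreamMatcher` and the method `feed` are hypothetical names chosen for this illustration.

```python
from collections import deque


class NaiveStreamMatcher:
    """Baseline streaming dictionary matcher (illustrative only).

    Keeps a window of the last m characters, where m is the length of the
    longest dictionary string, and tests every suffix whose length equals
    the length of some dictionary string.
    """

    def __init__(self, dictionary):
        # Empty strings are ignored: they would match at every position.
        self.patterns = {p for p in dictionary if p}
        self.lengths = sorted({len(p) for p in self.patterns})
        self.max_len = self.lengths[-1] if self.lengths else 0
        self.window = deque(maxlen=self.max_len)

    def feed(self, ch):
        """Consume one character; return dictionary strings ending here,
        ordered from shortest to longest."""
        self.window.append(ch)
        recent = "".join(self.window)
        hits = []
        for length in self.lengths:
            if length <= len(recent) and recent[-length:] in self.patterns:
                hits.append(recent[-length:])
        return hits


if __name__ == "__main__":
    matcher = NaiveStreamMatcher(["ab", "bab", "c"])
    for position, ch in enumerate("ababc"):
        print(position, ch, matcher.feed(ch))
```

Each call to `feed` costs time proportional to the number of distinct string lengths times m, which is the overhead the streaming algorithm in the abstract is designed to avoid.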
Online Detection of Repetitions with Backtracking
In this paper we present two algorithms for the following problem: given a
string and a rational , detect in the online fashion the earliest
occurrence of a repetition of exponent in the string.
1. The first algorithm supports the backtrack operation removing the last
letter of the input string. This solution runs in time and
space, where is the maximal length of a string generated during the
execution of a given sequence of read and backtrack operations.
2. The second algorithm works in time and space,
where is the length of the input string and is the number of
distinct letters. This algorithm is relatively simple and requires much less
memory than the previously known solution with the same working time and space.
Comment: 12 pages, 5 figures, accepted to CPM 201
A Very Large Array Search for 5 GHz Radio Transients and Variables at Low Galactic Latitudes
We present the results of a 5 GHz survey with the Very Large Array (VLA) and the expanded VLA, designed to search for short-lived (≲1 day) transients and to characterize the variability of radio sources at milli-Jansky levels. A total sky area of 2.66 deg^2, spread over 141 fields at low Galactic latitudes (b≅6-8 deg), was observed 16 times with a cadence that was chosen to sample timescales of days, months, and years. Most of the data were reduced, analyzed, and searched for transients in near real-time. Interesting candidates were followed up using visible light telescopes (typical delays of 1-2 hr) and the X-ray Telescope on board the Swift satellite. The final processing of the data revealed a single possible transient with a peak flux density of f_ν≅2.4 mJy. This implies a transient sky surface density of κ(f_ν > 1.8 mJy) = 0.039^(+0.13,+0.18)_(–0.032,–0.038) deg^(–2) (1σ, 2σ confidence errors). This areal density is roughly consistent with the sky surface density of transients from the Bower et al. survey extrapolated to 1.8 mJy. Our observed transient areal density is consistent with a neutron star origin for these events. Furthermore, we use the data to measure the source variability on timescales of days to years, and we present the variability structure function of 5 GHz sources. The mean structure function shows a fast increase on a ≈1 day timescale, followed by a slower increase on timescales of up to 10 days. On timescales between 10 and 60 days, the structure function is roughly constant. We find that ≳30% of the unresolved sources brighter than 1.8 mJy are variables at the >4σ confidence level, presumably mainly due to refractive scintillation.
Fast Algorithm for Partial Covers in Words
A factor of a word is a cover of if every position in lies
within some occurrence of in . A word covered by thus
generalizes the idea of a repetition, that is, a word composed of exact
concatenations of . In this article we introduce a new notion of
-partial cover, which can be viewed as a relaxed variant of cover, that
is, a factor covering at least positions in . We develop a data
structure of size (where ) that can be constructed in time which we apply to compute all shortest -partial covers for a
given . We also employ it for an -time algorithm computing
a shortest -partial cover for each
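The definitions of cover and partial cover above can be illustrated with a short Python sketch. The mathematical symbols were lost from this listing, so the names `u` (the factor), `w` (the word) and `alpha` (the required number of covered positions) are assumptions made for this example. This is a direct scan, not the data structure described in the abstract.

```python
def covered_positions(u, w):
    """Count the positions of w that lie within some occurrence of u."""
    if not u:
        return 0
    covered = 0
    reach = 0  # first position not yet counted as covered
    start = w.find(u)
    while start != -1:
        end = start + len(u)
        # Count only the part of this occurrence not already counted.
        covered += end - max(start, reach)
        reach = end
        start = w.find(u, start + 1)  # allow overlapping occurrences
    return covered


def is_cover(u, w):
    """True when every position of the non-empty word w is covered by u."""
    return bool(w) and covered_positions(u, w) == len(w)


def is_partial_cover(u, w, alpha):
    """True when u covers at least alpha positions of w."""
    return covered_positions(u, w) >= alpha


if __name__ == "__main__":
    print(is_cover("aba", "ababaaba"))
    print(covered_positions("ab", "abaab"))
    print(is_partial_cover("ab", "abaab", 4))
```

In the second example the factor `ab` covers four of the five positions of `abaab`, so under the assumed naming it is a 4-partial cover but not a cover.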
Efficient Seeds Computation Revisited
The notion of the cover is a generalization of a period of a string, and
there are linear-time algorithms for finding the shortest cover. The seed is a
more complicated generalization of periodicity: it is a cover of a superstring
of a given string, and the shortest seed problem is of much higher algorithmic
difficulty. The problem is not well understood, and no linear-time algorithm is
known. In the paper we give linear-time algorithms for some of its versions:
computing shortest left-seed array, longest left-seed array and checking for
seeds of a given length. The algorithm for the last problem is used to compute
the seed array of a string (i.e., the shortest seeds for all the prefixes of
the string) in time. We also describe a simpler alternative algorithm
that efficiently computes the shortest seeds. As a by-product we obtain an
time algorithm checking if the shortest seed has length at
least and finding the corresponding seed. We also correct some important
details missing in the previously known shortest-seed algorithm (Iliopoulos et
al., 1996).
Comment: 14 pages, accepted to CPM 201
Covering Problems for Partial Words and for Indeterminate Strings
We consider the problem of computing a shortest solid cover of an
indeterminate string. An indeterminate string may contain non-solid symbols,
each of which specifies a subset of the alphabet that could be present at the
corresponding position. We also consider covering partial words, which are a
special case of indeterminate strings where each non-solid symbol is a don't
care symbol. We prove that the indeterminate string covering problem and the
partial word covering problem are NP-complete for a binary alphabet and show that both
problems are fixed-parameter tractable with respect to , the number of
non-solid symbols. For the indeterminate string covering problem we obtain a
-time algorithm. For the partial word covering
problem we obtain a -time algorithm. We
prove that, unless the Exponential Time Hypothesis is false, no
-time solution exists for either problem, which shows
that our algorithm for this case is close to optimal. We also present an
algorithm for both problems which is feasible in practice.
Comment: full version (simplified and corrected); preliminary version appeared
at ISAAC 2014; 14 pages, 4 figures
Palindromic Decompositions with Gaps and Errors
Identifying palindromes in sequences has been an interesting line of research
in combinatorics on words and also in computational biology, after the
discovery of the relation of palindromes in the DNA sequence with the HIV
virus. Efficient algorithms for the factorization of sequences into palindromes
and maximal palindromes have been devised in recent years. We extend these
studies by allowing gaps in decompositions and errors in palindromes, and also
imposing a lower bound to the length of acceptable palindromes.
We first present an algorithm for obtaining a palindromic decomposition of a
string of length n with the minimal total gap length in time O(n log n * g) and
space O(n g), where g is the number of allowed gaps in the decomposition. We
then consider a decomposition of the string in maximal \delta-palindromes (i.e.
palindromes with \delta errors under the edit or Hamming distance) and g
allowed gaps. We present an algorithm to obtain such a decomposition with the
minimal total gap length in time O(n (g + \delta)) and space O(n g).
Comment: accepted to CSR 201
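As a rough illustration of the first problem in this abstract, the following Python sketch computes the minimal total gap length by a naive dynamic programme. It rests on several assumptions not confirmed by the abstract: a gap is taken to be any substring left outside the palindromes, at most `g` gaps are allowed, palindromes must be exact (no errors), and `min_len` is a hypothetical parameter for the lower bound on palindrome length. It runs in roughly O(n^2 g) time, so it is a baseline for small inputs and not the O(n log n * g) algorithm of the paper.

```python
def min_gap_decomposition(w, g, min_len=1):
    """Least total gap length when w is split into exact palindromes of
    length >= min_len plus at most g gaps; None if no such split exists."""
    n = len(w)
    # pal[i][j] is True when w[i:j+1] is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if w[i] == w[j] and (j - i < 2 or pal[i + 1][j - 1]):
                pal[i][j] = True

    inf = float("inf")
    # best[i][k]: least total gap length for the prefix w[:i] with k gaps.
    best = [[inf] * (g + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            for k in range(g + 1):
                if best[j][k] == inf:
                    continue
                # Option 1: w[j:i] is an acceptable palindrome.
                if i - j >= min_len and pal[j][i - 1]:
                    best[i][k] = min(best[i][k], best[j][k])
                # Option 2: w[j:i] is a gap, if another gap is allowed.
                if k < g:
                    best[i][k + 1] = min(best[i][k + 1],
                                         best[j][k] + (i - j))
    answer = min(best[n])
    return None if answer == inf else answer


if __name__ == "__main__":
    print(min_gap_decomposition("abaxyzcc", g=1, min_len=2))
    print(min_gap_decomposition("abacdd", g=0, min_len=2))
```

With `min_len=1` every single letter is a palindrome, so the answer is always 0; the lower bound on palindrome length is what makes the gaps necessary.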